Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor container chain spawning logic to not need require embeded node outside of collation #627

Merged
merged 13 commits into from
Jul 23, 2024

Conversation

nanocryk
Copy link
Contributor

Depends on moondance-labs/dancekit#26

Currently the container chain spawning logic requires an embeded orchestrator client, which prevents using it in data preservers nodes that may interact with the orchestrator through a WebSocket connection. The linked PR modifies the OrchestratorChainInterface to add functions to get necessary data without the need for a client. When collating a client is still needed, as a data preserver node will not collate on the orchestrator chain.

@@ -796,7 +810,7 @@ fn handle_update_assignment_state_change(
/// interrupted before it finished downloading the state, in that case the node will use warp sync.
/// If it was interrupted during the block history download, the node will use full sync but also
/// finish the block history download in the background, even if sync mode is set to full sync.
fn select_sync_mode(
pub fn select_sync_mode_using_client(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this function we only need the orchestrator_client to check an edge case for when the container chain is still at block 0. We could also use the relay_chain_interface and get the latest finalized block from there, would that be better for the ws node?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we could yes, in the end we are fetching that information from the relay itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which function in the relay interface should be called? Indeed that would allow to get rid of the generic and not need the orchestrator client.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably paras->heads, but you would need to parse the header to get the block number. There are some utility functions in pallet_author_noting, such as author_from_log. @girazoki any better ideas? Maybe there is some other storage that only exists after the parachain has included its first block?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we tackle that in a separate PR?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes let's do it as a separate PR. But it would definitely make it more compatible with solo-chains

Cargo.toml Outdated Show resolved Hide resolved
@nanocryk nanocryk added a0-pleasereview Pull request needs code review. B5-clientnoteworthy Changes should be mentioned in any downstream projects' release notes B7-runtimenoteworthy Changes should be noted in any runtime-upgrade release notes D3-trivial PR contains trivial changes in a runtime directory that do not require an audit labels Jul 17, 2024
Copy link
Contributor

github-actions bot commented Jul 17, 2024

Coverage Report

(master)

@@                              Coverage Diff                               @@
##           master   jeremy-orchestrator-spawner-without-client      +/-   ##
==============================================================================
- Coverage   67.32%                                       66.88%   -0.44%     
  Files         255                                          255              
+ Lines       44448                                        44587     +139     
==============================================================================
- Hits        29922                                        29819     -103     
+ Misses      14526                                        14768     +242     
Files Changed Coverage
/node/src/chain_spec/dancebox.rs 94.82% (+0.02%) 🔼
/node/src/chain_spec/flashbox.rs 70.87% (+0.12%) 🔼
/node/src/service.rs 19.97% (-0.88%) 🔽
/pallets/registrar/src/lib.rs 88.82% (-0.04%) 🔽
/runtime/dancebox/tests/common/mod.rs 98.63% (-0.01%) 🔽
/runtime/flashbox/tests/common/mod.rs 95.96% (-0.05%) 🔽
/solo-chains/runtime/starlight/tests/common/mod.rs 90.31% (-0.17%) 🔽

Coverage generated Mon Jul 22 08:32:43 UTC 2024

@@ -49,7 +49,7 @@ yargs(hideBin(process.argv))
}
process.stdout.write(`Done ✅\n`);
const onChainGenesisData = await api.createType(
"TpContainerChainGenesisDataContainerChainGenesisData",
"DpContainerChainGenesisDataContainerChainGenesisData",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remember to mention this in breaking changes, this will probably break all the registration scripts we have

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why so?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that uses api.createType("Tp... instead of Dp will stop working?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes but I dont think we use it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe the dapp does

@nanocryk nanocryk added the breaking Needs to be mentioned in breaking changes label Jul 17, 2024
@@ -235,6 +246,8 @@ async fn try_spawn(
);

if !start_collation {
collation_params = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot why we used both validator && start_collation instead of just one boolean, here is a summary:

The reason start_collation exists as a separate arg to validator is that in the old days instead of restarting the container chain node we used collate_on to sent a message to enable the collator. So we would have to pass collation_params set to Some and start_collation set to false, and then use the collate_on closure to start collation later.

Now we don't have that collate_on, so just setting to None here is ok. Although I'm considering bringing collate_on back, for parathreads, but that's an issue for the future.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mayeb we should add a comment for this then

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that in the next refactor

@@ -70,11 +69,21 @@ const MAX_DB_RESTART_TIMEOUT: Duration = Duration::from_secs(60);
/// Assuming a syncing speed of 100 blocks per second, this will take 5 minutes to sync.
const MAX_BLOCK_DIFF_FOR_FULL_SYNC: u32 = 30_000;

pub trait TSelectSyncMode:
Send + Sync + Clone + 'static + (Fn(bool, ParaId) -> sc_service::error::Result<SyncMode>)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using Fn() as a trait bound is very ugly, promise to remove this trait as soon as possible in a future PR

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha, as we discussed we should be able to remove SelectSyncMode entirely :)

@tmpolaczyk
Copy link
Contributor

Zombienet tests are failing with error:

2024-07-17 15:32:33 [Orchestrator] Failed to start container chain 2002: Other: No genesis data registered for container chain id 2002    

so indeed some registration script probably broke

@tmpolaczyk
Copy link
Contributor

so indeed some registration script probably broke

Actually the registration seems to work fine, it's just the node that fails to read the genesis data for some reason

@nanocryk
Copy link
Contributor Author

It was due to the fact that I was storing an orchestrator block hash in the main structure itself, which is probably the genesis block in that test. Since 2002 is registered on the fly later, the spawner was not able to get the genesis data. I modified (again 💀 ) the chain interface to allow fetching the best or finalized head (the relay interface have that too), which should fix the issue.

Currently I use best head to do like previous code, but it might be better to use the finalized one?

@tmpolaczyk
Copy link
Contributor

Currently I use best head to do like previous code, but it might be better to use the finalized one?

In practice it shouldn't matter because the values we read only change after 1 session, but I guess using finalized is more consistent with the rx_loop function, which reacts on finalized blocks (because otherwise it would need to unspawn a container chain in case of block revert).

@girazoki
Copy link
Collaborator

Currently I use best head to do like previous code, but it might be better to use the finalized one?

In practice it shouldn't matter because the values we read only change after 1 session, but I guess using finalized is more consistent with the rx_loop function, which reacts on finalized blocks (because otherwise it would need to unspawn a container chain in case of block revert).

I did create an issue to change just this, but let's do it as a separate PR

@nanocryk nanocryk merged commit bf28d34 into master Jul 23, 2024
36 checks passed
@nanocryk nanocryk deleted the jeremy-orchestrator-spawner-without-client branch July 23, 2024 09:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a0-pleasereview Pull request needs code review. B5-clientnoteworthy Changes should be mentioned in any downstream projects' release notes B7-runtimenoteworthy Changes should be noted in any runtime-upgrade release notes breaking Needs to be mentioned in breaking changes D3-trivial PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants